Bulk Insertions into R-Trees

Authors

  • Li Chen
  • Rupesh Choubey
  • Elke A. Rundensteiner
Abstract

A lot of recent work has focused on bulk loading of data into multidimensional index structures in order to efficiently construct such structures for large datasets. Previous work on bulk loading focused on building index structures from scratch, while the problem of bulk insertions into existing index structures has been largely overlooked. In this paper, we address this new problem with particular focus on R-trees, which are an important class of index structures used widely in commercial database systems. We propose a new technique which, as opposed to the current technique of inserting data one by one, bulk-inserts entire new datasets into an active R-tree. This technique, called STLT (for Small-Tree-Large-Tree), considers the new dataset as an R-tree itself (the small tree), identifies and prepares a suitable location in the original R-tree (the large tree) for insertion, and lastly performs the insertion of the small tree into the large tree. In this paper, we present an analytical model of STLT. Extensive experimental studies on both synthetic and real data sets from the Sequoia 2000 storage benchmark are also reported. These experiments not only compare STLT against the conventional technique, but also evaluate the suitability and limitations of STLT under different conditions, such as varying buffer sizes, the ratio between existing and new data sizes, and the skewedness of the new data with respect to the whole spatial region. We find that STLT does especially well (up to 80% better than the existing technique) for skewed datasets as well as for large ratios of large tree to small tree data insertion sizes, while consistently outperforming the alternate technique in all other circumstances. Our experimental results also indicate that the quality of the resulting tree constructed by STLT, in terms of query performance, is acceptable and in most cases preferable to that created by the traditional tree insertion approach.
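The three STLT steps named in the abstract (treat the new data as a small tree, locate a suitable position in the large tree, attach the small tree there) can be illustrated with a minimal sketch. The abstract does not give the algorithm's details, so everything below is an assumption made for illustration: the `Node` class, the helpers `union`, `area`, `enlargement` and `leaf`, the `max_entries` parameter, and in particular the choice of descending by least MBR area enlargement (borrowed from classic R-tree insertion). Overflow handling is deliberately left out; the sketch simply raises an error where a full implementation would need to split or restructure.

```python
from dataclasses import dataclass, field
from typing import List, Tuple

Rect = Tuple[float, float, float, float]  # (xmin, ymin, xmax, ymax)


def union(a: Rect, b: Rect) -> Rect:
    """Smallest rectangle covering both a and b."""
    return (min(a[0], b[0]), min(a[1], b[1]), max(a[2], b[2]), max(a[3], b[3]))


def area(r: Rect) -> float:
    return (r[2] - r[0]) * (r[3] - r[1])


def enlargement(r: Rect, extra: Rect) -> float:
    """Extra area r needs in order to also cover `extra`."""
    return area(union(r, extra)) - area(r)


@dataclass(eq=False)
class Node:
    """Hypothetical R-tree node; a node without children is a data entry."""
    mbr: Rect
    children: List["Node"] = field(default_factory=list)

    def height(self) -> int:
        # Assumes a balanced tree, so following the first child is enough.
        return 0 if not self.children else 1 + self.children[0].height()


def leaf(*rects: Rect) -> Node:
    """Hypothetical helper: build a leaf node over the given data rectangles."""
    mbr = rects[0]
    for r in rects[1:]:
        mbr = union(mbr, r)
    return Node(mbr, [Node(r) for r in rects])


def stlt_insert(large: Node, small: Node, max_entries: int = 4) -> Node:
    """Attach the root of `small` as one entry inside `large` (illustrative only)."""
    # The small tree must hang under a node exactly one level taller than
    # itself, otherwise the combined tree would no longer be balanced.
    target_height = small.height() + 1
    if large.height() < target_height:
        raise ValueError("small tree is too tall to fit inside the large tree")

    # Step 2: locate a position, here by least area enlargement (assumption).
    node, path = large, [large]
    while node.height() > target_height:
        node = min(node.children, key=lambda c: enlargement(c.mbr, small.mbr))
        path.append(node)

    if len(node.children) >= max_entries:
        # Assumption: overflow handling is outside the scope of this sketch.
        raise OverflowError("target node is full")

    # Step 3: insert the whole small tree as a single entry, then grow the
    # bounding rectangles along the path from the root.
    node.children.append(small)
    for n in path:
        n.mbr = union(n.mbr, small.mbr)
    return large
```

In this sketch the cost of locating the position is one root-to-node descent for the entire batch, rather than one descent per inserted object, which is the intuition behind inserting the new data as a tree instead of one by one.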

Similar resources

GBI: A Generalized R-Tree Bulk-Insertion Strategy

A lot of recent work has studied strategies related to bulk loading of large data sets into multidimensional index structures. In this paper, we address the problem of bulk insertions into existing index structures, with particular focus on R-trees, which are an important class of index structures used widely in commercial database systems. We propose a new technique which, as opposed to the curren...


Bulk Insertions into xBR+-trees

Bulk insertion refers to the process of updating an existing index by inserting a large batch of new data, treating the items of this batch as a whole and not by inserting these items one-by-one. Bulk insertion is related to bulk loading, which refers to the process of creating a non-existing index from scratch, when the dataset to be indexed is available beforehand. The xBR-tree is a balan...


Bulk insertion for R-trees by seeded clustering

We propose a scalable technique called Seeded Clustering that allows us to maintain R-tree indices by bulk insertion while keeping pace with high data arrival rates. Our approach uses a seed tree, which is copied from the top k levels of a target R-tree, to classify input data objects into clusters. We then build an R-tree for each of the clusters and insert the input R-trees into the target R-t...


The HeightBL Algorithm for Bulk-loading F-Onion-trees

The F-Onion-tree is a robust access method that slices the metric space into disjoint subspaces to provide quick indexing of complex data in the main memory. However, the F-Onion-tree only performs element-by-element insertions into its structure, i.e. it does not introduce a technique to build the index considering all elements of the dataset at once. In this article, we fill this gap. We prop...


The Dynamic Longest Increasing Subsequence Problem

In this paper, we construct a data structure using a forest of red-black trees to efficiently compute the longest increasing subsequence of a dynamically updated sequence. Our data structure supports a query for the longest increasing subsequence in O(r + log n) worst-case time and supports inserts anywhere in the sequence in O(r log n/r) worst-case time, where r is the length of the longest incr...


Publication date: 1998